Intelligent Paper Notes

Growing Instance Mask on Leaf

Chuang Yang , Haozhao Ma , Qi Wang

Category: Computer Vision | Artificial Intelligence

2022-11-30

Contour-based instance segmentation methods include one-stage and multi-stage schemes. These approaches achieve remarkable performance. However, they have to define plenty of points to segment precise masks, which leads to high complexity. We address this issue and present a single-shot method, called VeinMask, that achieves competitive performance with low design complexity. Concretely, we observe that a leaf locates its coarse margins via major veins and grows minor veins to refine twisty parts, which makes it possible to cover any object accurately. Meanwhile, major and minor veins share the same growth mode, which avoids modeling them separately and keeps the model simple. Considering the advantages above, we propose VeinMask to formulate the instance segmentation problem as a simulation of the vein growth process and to predict the major and minor veins in polar coordinates. Besides, centroidness is introduced into instance segmentation to help suppress low-quality instances. Furthermore, a surroundings cross-correlation sensitive (SCCS) module is designed to enhance feature expression by utilizing the surroundings of each pixel. Additionally, a Residual IoU (R-IoU) loss is formulated to effectively supervise the regression of major and minor veins. Experiments demonstrate that VeinMask performs much better than other contour-based methods with low design complexity. In particular, our method outperforms existing one-stage contour-based methods on the COCO dataset with almost half the design complexity.
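
As a rough illustration of a polar-coordinate contour representation, the sketch below assumes that each vein is a ray cast from the instance center at a fixed angle, so a mask outline is recovered by converting (angle, length) pairs into polygon vertices. The helper names, the 8 + 8 ray layout, and the simplification that minor veins also start at the center (in VeinMask they may instead branch off the major veins) are all assumptions; this is not the authors' decoder.

```python
import math


def polar_to_polygon(cx, cy, angles, lengths):
    """Convert rays (angle in radians, length) cast from (cx, cy) into contour vertices.

    Hypothetical helper: VeinMask's actual vein layout is not specified here.
    """
    if len(angles) != len(lengths):
        raise ValueError("angles and lengths must have the same size")
    # Sort rays by angle so the vertices trace the contour in order.
    rays = sorted(zip(angles, lengths))
    return [(cx + r * math.cos(a), cy + r * math.sin(a)) for a, r in rays]


def uniform_angles(n, offset=0.0):
    """n evenly spaced angles in [0, 2*pi), optionally rotated by offset."""
    return [offset + 2 * math.pi * i / n for i in range(n)]


def polygon_area(vertices):
    """Shoelace formula, used here only to sanity-check the decoded contour."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


# Assumed layout: 8 coarse "major" rays plus 8 "minor" rays halfway between them.
major = uniform_angles(8)
minor = uniform_angles(8, offset=math.pi / 8)
lengths = [10.0] * 16  # a circle of radius 10 as a toy target shape
contour = polar_to_polygon(0.0, 0.0, major + minor, lengths)
print(len(contour), round(polygon_area(contour), 2))
```

Inserting minor rays between the major ones refines the same contour without a separate representation, which loosely mirrors the abstract's point that both vein types share one growth mode.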

translated by Google Translate

Parameterized Knowledge Transfer for Personalized Federated Learning

Jie Zhang , Song Guo , Xiaosong Ma , Haozhao Wang , Wencao Xu , Feijie Wu

Category: Machine Learning

2021-11-04

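In recent years, personalized federated learning (PFL) has attracted increasing attention for its potential to handle statistical heterogeneity among clients. However, state-of-the-art PFL methods rely on aggregating model parameters on the server side, which requires all models to have the same structure and size and therefore limits their application in more heterogeneous scenarios. To handle such model constraints, we exploit the potential of heterogeneous model settings and propose a novel training framework that uses personalized models for different clients. Specifically, we reformulate the aggregation procedure of the original PFL into a personalized group knowledge transfer training algorithm, namely KT-PFL, which enables each client to maintain a personalized soft prediction on the server side to guide the local training of the others. KT-PFL updates the personalized soft prediction of each client as a linear combination of all local soft predictions weighted by a knowledge coefficient matrix, which adaptively strengthens collaboration among clients that have similar data distributions. Furthermore, to quantify each client's contribution to the personalized training of the others, the knowledge coefficient matrix is parameterized so that it can be trained jointly with the models. The knowledge coefficient matrix and the model parameters are updated alternately in each round by gradient descent. Extensive experiments are conducted on various datasets (EMNIST, Fashion_MNIST, CIFAR-10) under different settings (heterogeneous models and data distributions). The results demonstrate that the proposed framework is the first federated learning paradigm to realize personalized model training via parameterized group knowledge transfer, while achieving significant performance gains over state-of-the-art algorithms.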
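
The server-side update described above, where each client's personalized soft prediction is a linear combination of all local soft predictions weighted by a row of the knowledge coefficient matrix, can be sketched in a few lines. This is a minimal sketch assuming that the soft predictions are computed on a shared batch of inputs and that each coefficient row is normalized to sum to 1 so the outputs stay probability vectors; the function names are hypothetical, and training the coefficients by gradient descent is not shown.

```python
def normalize_rows(matrix):
    """Rescale each row to sum to 1 (an assumption so outputs stay distributions)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def personalized_soft_predictions(coeffs, local_preds):
    """S_i = sum_j coeffs[i][j] * s_j for every client i.

    coeffs: n x n knowledge coefficient matrix (row i holds the weights for client i).
    local_preds: n clients x m samples x c classes of local soft predictions.
    """
    n = len(local_preds)
    m, c = len(local_preds[0]), len(local_preds[0][0])
    out = []
    for i in range(n):
        s_i = [[0.0] * c for _ in range(m)]
        for j in range(n):
            w = coeffs[i][j]
            for a in range(m):
                for b in range(c):
                    s_i[a][b] += w * local_preds[j][a][b]
        out.append(s_i)
    return out


# Clients 0 and 1 are assumed to have similar data, so they share weight.
coeffs = normalize_rows([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]])
preds = [[[0.9, 0.1]], [[0.7, 0.3]], [[0.2, 0.8]]]
targets = personalized_soft_predictions(coeffs, preds)
```

Each client would then distill toward its own row of `targets` during local training, which is how clients with different model architectures can still exchange knowledge.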

Sensitivity analysis of biological washout and depth selection for a machine learning based dose verification framework in proton therapy

Shixiong Yu , Yuxiang Liu , Zongsheng Hu , Haozhao Zhang , Pengyu Qi , Hao Peng

Category: Machine Learning

2022-12-21

Dose verification based on proton-induced positron emitters is a promising quality assurance tool and can leverage the strengths of artificial intelligence. To move a step closer towards practical application, a sensitivity analysis of two factors needs to be performed: biological washout and depth selection. A bi-directional recurrent neural network (RNN) model was developed. The training dataset was generated from a CT image-based phantom (abdomen region) and multiple beam energies/pathways using Monte Carlo simulation (1 mm spatial resolution, no biological washout). To model biological washout, a simplified analytical model was applied to modify the raw activity profiles over a period of 5 minutes, incorporating both physical decay and biological washout. To study depth selection (a challenge linked to multi-field/multi-angle irradiation), the raw activity profiles were truncated at different window lengths (100, 125, and 150 mm). Finally, a worst-case scenario was examined by combining both factors (depth selection: 125 mm; biological washout: 5 min). Accuracy was evaluated quantitatively in terms of range uncertainty, mean absolute error (MAE), and mean relative error (MRE). The proposed AI framework shows good robustness to the perturbations associated with both factors. The detection of proton-induced positron emitters, combined with machine learning, has great potential for implementing online patient-specific verification in proton therapy.
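
The two perturbations can be sketched as simple transforms on a depth-activity profile. This is a minimal sketch under stated assumptions: each isotope's activity decays with a single exponential that combines its physical decay constant with a biological washout rate, the 5 minutes are treated as elapsed time rather than an integrated acquisition window, and truncation keeps the first part of the profile. The C-11 (about 20.4 min) and O-15 (about 2.04 min) half-lives are standard physical values; the washout rates and all function names are hypothetical, not the authors' parameters. MAE and MRE use common definitions, which may differ from the paper's normalization.

```python
import math

# Physical half-lives in minutes (standard values, rounded).
HALF_LIFE_MIN = {"C11": 20.4, "O15": 2.04}
# Hypothetical biological washout rates (1/min); not the authors' parameters.
WASHOUT_PER_MIN = {"C11": 0.01, "O15": 0.05}


def surviving_fraction(isotope, t_min):
    """Fraction of initial activity left after t_min minutes of decay plus washout."""
    lam_phys = math.log(2) / HALF_LIFE_MIN[isotope]
    lam_bio = WASHOUT_PER_MIN[isotope]
    return math.exp(-(lam_phys + lam_bio) * t_min)


def washed_profile(profiles, t_min):
    """Sum per-isotope depth profiles after applying decay and washout."""
    depth_bins = len(next(iter(profiles.values())))
    total = [0.0] * depth_bins
    for iso, prof in profiles.items():
        f = surviving_fraction(iso, t_min)
        for k, a in enumerate(prof):
            total[k] += f * a
    return total


def truncate(profile, window_mm, resolution_mm=1.0):
    """Keep only the first window_mm of a profile sampled at resolution_mm."""
    return profile[: int(round(window_mm / resolution_mm))]


def mae(pred, ref):
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def mre(pred, ref):
    # Bins with a zero reference are skipped to avoid division by zero.
    pairs = [(p, r) for p, r in zip(pred, ref) if r != 0]
    return sum(abs(p - r) / abs(r) for p, r in pairs) / len(pairs)


profiles = {"C11": [1.0] * 200, "O15": [2.0] * 200}  # toy flat profiles, 1 mm bins
worst_case = truncate(washed_profile(profiles, 5.0), 125)
```

Because O-15 decays much faster than C-11, the 5-minute perturbation changes the isotope mix along the profile, not just its overall scale, which is why it is a meaningful robustness test.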

FR: Folded Rationalization with a Unified Encoder

Wei Liu , Haozhao Wang , Jun Wang , Ruixuan Li , Chao Yue , Yuankai Zhang

Category: Machine Learning | Natural Language Processing

2022-09-17

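Conventional works generally adopt a two-phase model in which a generator selects the most important pieces of the input, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may suffer from the degeneration problem: the predictor overfits to the noise produced by a not-yet-well-trained generator, which in turn leads the generator to converge to a sub-optimal model that tends to select meaningless pieces. To tackle this challenge, we propose Folded Rationalization (FR), which folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder shared by the generator and the predictor. Through this shared encoder, the predictor can access valuable information that the generator blocks in the traditional two-phase model, which yields a better predictor and, in turn, a better generator. Empirically, we show that FR improves the F1 score by 10.3% compared with state-of-the-art methods.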
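
The structural difference between the two-phase setup and FR can be shown without any training: in the former, the generator and predictor each own an encoder, while in FR a single encoder instance serves both. This toy sketch uses hypothetical classes (`Encoder`, `Generator`, `Predictor`); the "encoder" is just an embedding lookup, the generator keeps tokens whose first feature exceeds a threshold, and the predictor mean-pools the kept tokens. A real rationalization model would use a neural encoder and a differentiable selection mechanism.

```python
def copy_table(table):
    """Deep-copy a token -> vector table so that encoders do not share state by accident."""
    return {token: list(vec) for token, vec in table.items()}


class Encoder:
    """Toy encoder: maps each token to a stored vector (stand-in for a neural encoder)."""

    def __init__(self, table, dim):
        self.table = table
        self.dim = dim

    def encode(self, tokens):
        return [self.table.get(t, [0.0] * self.dim) for t in tokens]


class Generator:
    """Selects a rationale: keeps tokens whose first feature exceeds a threshold."""

    def __init__(self, encoder, threshold=0.5):
        self.encoder = encoder
        self.threshold = threshold

    def select(self, tokens):
        return [1 if h[0] > self.threshold else 0 for h in self.encoder.encode(tokens)]


class Predictor:
    """Predicts from the selected tokens only: mean pooling followed by a linear score."""

    def __init__(self, encoder, weights):
        self.encoder = encoder
        self.weights = weights

    def predict(self, tokens, mask):
        kept = [t for t, m in zip(tokens, mask) if m]
        if not kept:
            return 0.0
        states = self.encoder.encode(kept)
        pooled = [sum(col) / len(states) for col in zip(*states)]
        return sum(w * x for w, x in zip(self.weights, pooled))


def build_folded(table, dim, weights):
    """FR-style: one encoder object shared by the generator and the predictor."""
    shared = Encoder(copy_table(table), dim)
    return Generator(shared), Predictor(shared, weights)


def build_two_phase(table, dim, weights):
    """Conventional: each phase owns a separate encoder."""
    return (Generator(Encoder(copy_table(table), dim)),
            Predictor(Encoder(copy_table(table), dim), weights))


table = {"great": [0.9, 1.0], "plot": [0.2, 0.1], "boring": [0.8, -1.0]}
gen, pred = build_folded(table, 2, [0.0, 1.0])
tokens = ["great", "plot"]
mask = gen.select(tokens)
score = pred.predict(tokens, mask)
```

Because both modules hold the same `Encoder` object, any update to it during training is seen by both; in the two-phase build, each side has to learn its own representation.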

From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization

Feijie Wu , Song Guo , Haozhao Wang , Zhihao Qu , Haobo Zhang , Jie Zhang , Ziming Liu

Category: Machine Learning

2021-12-17

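In federated optimization, where a global model is aggregated periodically, step asynchronism occurs when participants train the model by fully utilizing their computational resources, so that faster participants complete more local steps per round. It is well acknowledged that, under non-i.i.d. data, step asynchronism leads to objective inconsistency, which degrades model accuracy. To address this issue, we propose a new algorithm, FedaGrac, which calibrates the local direction toward a predicted global direction. Taking advantage of the estimated direction, we guarantee that the aggregated model does not deviate excessively from the expected direction while fully utilizing the local updates of faster nodes. Theoretically, we prove that FedaGrac achieves an improved order of convergence rate compared with state-of-the-art methods and eliminates the negative effect of step asynchronism. Empirical results show that our algorithm accelerates training and improves final accuracy.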
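
The objective-inconsistency problem and the effect of calibrating local directions can be reproduced on a toy scalar problem. The abstract does not give FedaGrac's update rule, so this sketch swaps in a SCAFFOLD-style control-variate correction, a related but different calibration technique: each local gradient is shifted by the difference between the global and the client's own gradient at the start of the round. Client i minimizes f_i(x) = 0.5 (x - a_i)^2 and runs its own number of local steps; all names are hypothetical.

```python
def local_train(x0, a_i, steps, lr, correction=0.0):
    """Run `steps` gradient steps on f_i(x) = 0.5 * (x - a_i)**2 with a shifted gradient."""
    x = x0
    for _ in range(steps):
        grad = (x - a_i) + correction
        x -= lr * grad
    return x


def federated_round(x, targets, steps, lr, calibrate):
    n = len(targets)
    # Gradient of every local objective at the current global model.
    local_grads = [x - a for a in targets]
    global_grad = sum(local_grads) / n
    new_models = []
    for a_i, k_i, g_i in zip(targets, steps, local_grads):
        # Control-variate shift: swap the client's own drift for the global direction.
        corr = (global_grad - g_i) if calibrate else 0.0
        new_models.append(local_train(x, a_i, k_i, lr, corr))
    return sum(new_models) / n  # plain averaging of client models


def run(targets, steps, lr=0.1, rounds=200, calibrate=False):
    x = 0.0
    for _ in range(rounds):
        x = federated_round(x, targets, steps, lr, calibrate)
    return x


# Two clients with non-i.i.d. optima 0 and 10; the global optimum is 5.
# The second client is "faster" and runs 10 local steps versus 1.
plain = run([0.0, 10.0], [1, 10])
calibrated = run([0.0, 10.0], [1, 10], calibrate=True)
```

Without calibration, the averaged model settles well away from 5 (pulled toward the client that takes more steps); with equal step counts, or with the correction, it converges to the global optimum.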

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-03

Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which combines the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges and commonly used KGs and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the method. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
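
To make the few-shot setting concrete: for a long-tail relation, only K support triples (head, tail) are known, and the model must complete queries (head, ?) for that relation. This minimal sketch assumes pretrained entity embeddings and uses a simple TransE-style prototype, the mean of (tail - head) over the support set; it is a generic baseline for illustration, not a specific method from the survey, and the 2-D vectors are made up.

```python
def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def l2(u):
    return sum(a * a for a in u) ** 0.5


def relation_prototype(support, emb):
    """Average (tail - head) over the K support triples of a few-shot relation."""
    diffs = [sub(emb[t], emb[h]) for h, t in support]
    dim = len(diffs[0])
    return [sum(d[k] for d in diffs) / len(diffs) for k in range(dim)]


def rank_tails(head, candidates, proto, emb):
    """Rank candidate tails by TransE-style distance ||h + r - t|| (smaller is better)."""
    return sorted(candidates, key=lambda t: l2(sub(add(emb[head], proto), emb[t])))


# Made-up 2-D embeddings; the relation is "capital of" with K = 2 support triples.
emb = {
    "paris": [0.0, 0.0], "france": [1.0, 1.0],
    "berlin": [5.0, 0.0], "germany": [6.0, 1.0],
    "rome": [10.0, 0.0], "italy": [11.0, 1.0],
    "spain": [3.0, 3.0],
}
support = [("paris", "france"), ("berlin", "germany")]
proto = relation_prototype(support, emb)
ranking = rank_tails("rome", ["spain", "france", "italy"], proto, emb)
```

Methods covered by such surveys typically replace the plain mean with learned neighbor encoders or matching networks, but the episode structure (K support triples, then ranked queries) stays the same.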

I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

Haoyu Ma , Xiangru Lin , Yizhou Yu

Category: Computer Vision

2023-01-03

Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, the existing literature lacks such unified approaches for UDA tasks. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; we further regularize category centers in the source domain through a category-oriented triplet loss and perform target-domain consistency regularization over augmented target-domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5 → Cityscapes task, our method with DeepLab V3+ as the backbone surpasses the previous state of the art by 8%, achieving 58.2% mIoU.
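
The global photometric alignment module is described only at a high level. A common and much simpler form of photometric alignment is to match each color channel's mean and standard deviation in a source image to target-domain statistics; the pure-Python sketch below does exactly that as an illustrative stand-in. The function names and the use of fixed target statistics are assumptions, and the paper's module may work quite differently.

```python
import statistics


def channel_stats(image):
    """Per-channel (mean, population std); image is a list of pixels, each a tuple of channels."""
    channels = list(zip(*image))
    return [(statistics.fmean(ch), statistics.pstdev(ch)) for ch in channels]


def align_photometric(source, target_stats, eps=1e-8):
    """Shift and scale each source channel so its mean/std match the target statistics."""
    src_stats = channel_stats(source)
    out = []
    for px in source:
        out.append(tuple(
            (v - m_s) / (s_s + eps) * s_t + m_t
            for v, (m_s, s_s), (m_t, s_t) in zip(px, src_stats, target_stats)
        ))
    return out


# Toy 3-pixel RGB "image" and assumed target-domain channel statistics.
source = [(0.0, 10.0, 100.0), (2.0, 14.0, 50.0), (4.0, 18.0, 0.0)]
target_stats = [(0.5, 0.1), (0.4, 0.2), (0.3, 0.05)]
aligned = align_photometric(source, target_stats)
```

The `eps` term guards against constant channels; such a channel is mapped to the target mean but keeps zero spread.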

KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations

Wei Xiong , Muyuan Ma , Pei Sun , Yang Tian

Category: Machine Learning

2023-01-03

Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably involves a trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent, neural-network-based PDE solver, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative example that outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5 (one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest that KoopmanLab has potential for diverse applications of partial differential equations.
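
Koopman-based solvers rest on the idea that dynamics can evolve linearly in a suitable set of coordinates; roughly speaking, KNO learns an approximation of such a linear operator acting on Fourier modes. For the 1-D heat equation u_t = ν u_xx on a periodic domain, Fourier modes already give such coordinates and the exact one-step operator is diagonal: mode k is multiplied by exp(-ν k² Δt). The pure-Python sketch below applies that known operator and keeps only low-frequency modes. It is an analytic illustration of the Koopman/Fourier view, not KoopmanLab's API, and the function names are hypothetical.

```python
import cmath
import math


def dft(u):
    """Naive discrete Fourier transform (O(n^2), fine for a toy grid)."""
    n = len(u)
    return [sum(u[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(u_hat):
    """Inverse DFT; returns the real part because the fields here are real."""
    n = len(u_hat)
    return [(sum(u_hat[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n).real
            for j in range(n)]


def signed_wavenumber(k, n):
    """Map DFT index k to its integer wavenumber on a 2*pi-periodic grid."""
    return k if k <= n // 2 else k - n


def heat_koopman_step(u, nu, dt, modes):
    """Advance u_t = nu * u_xx by dt using a diagonal operator on retained Fourier modes."""
    n = len(u)
    evolved = []
    for k, c in enumerate(dft(u)):
        m = signed_wavenumber(k, n)
        if abs(m) > modes:
            evolved.append(0j)  # drop high-frequency modes, as spectral operators do
        else:
            evolved.append(c * math.exp(-nu * m * m * dt))
    return idft(evolved)


def rollout(u, nu, dt, modes, steps):
    for _ in range(steps):
        u = heat_koopman_step(u, nu, dt, modes)
    return u


n = 16
xs = [2 * math.pi * j / n for j in range(n)]
u0 = [math.sin(x) + 0.5 * math.sin(3 * x) for x in xs]
u_final = rollout(u0, nu=0.1, dt=0.05, modes=4, steps=20)
```

For a nonlinear PDE such as Navier-Stokes no exact diagonal operator is available in these coordinates, which is where a learned encoder and a learned Koopman operator come in.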

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

Category: Computer Vision

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to extract a speaking style from an arbitrary reference video and then drive a single portrait to speak with the reference speaking style and a separate audio clip. Specifically, we first develop a style encoder to extract dynamic facial motion patterns from a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and the style code. To integrate the reference speaking style into the generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to this style-aware adaptation mechanism, the reference speaking style can be better embedded into the synthesized videos during decoding. Extensive experiments demonstrate that our method can generate talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
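
The abstract says the style code adjusts the weights of the feed-forward layers but does not say how. One common way to condition layer weights on a code, used for example in dynamic-convolution-style layers, is to predict mixing coefficients over K candidate weight matrices. The sketch below does that for a single feed-forward layer; this mechanism, and every name in it, is an assumption for illustration rather than StyleTalk's actual design.

```python
import math


def softmax(z):
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def style_adaptive_ffn(x, style_code, candidate_weights, style_proj):
    """Hypothetical style-aware FFN: the style code picks a mixture of K candidate weights.

    x: input feature (length d); style_code: length s;
    candidate_weights: K matrices of shape d_out x d;
    style_proj: K x s matrix mapping the style code to mixing logits.
    """
    alphas = softmax(matvec(style_proj, style_code))
    rows, cols = len(candidate_weights[0]), len(candidate_weights[0][0])
    mixed = [[sum(a * w[i][j] for a, w in zip(alphas, candidate_weights))
              for j in range(cols)] for i in range(rows)]
    return [max(0.0, h) for h in matvec(mixed, x)]  # ReLU activation


ident = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
proj = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
style_a = style_adaptive_ffn(x, [10.0, 0.0], [ident, double], proj)  # mostly `ident`
style_b = style_adaptive_ffn(x, [0.0, 10.0], [ident, double], proj)  # mostly `double`
```

The same input feature is transformed differently depending on the style code, while the layer's structure stays fixed, which is the property the abstract attributes to the style-aware adaptive transformer.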

A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li , Yawen Huang , Nanjun He , Kai Ma , Yefeng Zheng

Category: Computer Vision | Artificial Intelligence

2023-01-03

Transformers have achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer on limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens in order to simultaneously measure difficulty and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared with ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
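
BOLT's name and its online/target branches follow BYOL (Bootstrap Your Own Latent). Assuming BOLT keeps BYOL's usual ingredients, which the abstract does not spell out, the target branch is an exponential moving average (EMA) of the online branch, and the online branch is trained to match the target's representation with a normalized-distance loss. The sketch below shows those two pieces on plain Python vectors, plus a hypothetical label for the auxiliary difficulty-ranking task in which "difficulty" is reduced to a scalar perturbation strength.

```python
import math


def ema_update(target, online, tau=0.99):
    """Target parameters track an exponential moving average of the online parameters."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def normalized(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def byol_loss(prediction, target_projection):
    """2 - 2 * cosine similarity, i.e. the squared distance between unit vectors."""
    p, z = normalized(prediction), normalized(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def difficulty_label(online_strength, target_strength):
    """Hypothetical auxiliary target: 1 if the online branch received the stronger perturbation."""
    return 1 if online_strength > target_strength else 0


target_params = ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9)
```

The loss depends only on direction, so a prediction that is a scaled copy of the target representation incurs zero loss; the EMA keeps the target branch slowly moving, which is what lets the online branch bootstrap from it without negative pairs.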
